Multiclass SVM Loss:
Linear Classification:
Matrix multiply: $f(x, W) = Wx$, where the input image $x$ is stretched into a one-dimensional column vector and $W$ is a weight matrix with one row per class.
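The score function above can be sketched in plain Python as a minimal illustration. The 2x2 "image" and the 3x4 weight matrix are hypothetical values chosen only to make the arithmetic easy to follow. The helper name `linear_scores` is also an assumption, not from the original notes.

```python
def linear_scores(W, x):
    """Compute f(x, W) = W x for W given as a list of rows and a flattened input x."""
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must have len(x) entries")
    # One score per class: the dot product of that class's weight row with x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Hypothetical 2x2 "image", stretched into a one-dimensional vector.
image = [[1, 2], [3, 4]]
x = [p for row in image for p in row]  # [1, 2, 3, 4]

# Hypothetical weights for 3 classes (3 rows x 4 pixels).
W = [
    [1, 0, 0, 0],
    [0, 1, 0, -1],
    [1, 1, 1, 1],
]
scores = linear_scores(W, x)
print(scores)  # [1, -2, 10]
```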
Multiclass SVM Loss:
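Let $s = f(x_i, W)$ be the scores. Then the SVM loss for example $(x_i, y_i)$ has the form:
$$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)$$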
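$s_{y_i}$ is the correct label's score, while $s_j$ ($j \neq y_i$) are the wrong labels' scores. When $s_j$ is larger than $s_{y_i} - 1$, that term contributes to the loss, because $s_j - s_{y_i} + 1$ is then greater than $0$. In other words, the loss is zero only if the correct score beats every wrong score by a margin of at least 1.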
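A minimal sketch of the per-example hinge loss with a margin of 1 follows. The scores `[3.2, 5.1, -1.7]` (cat, car, frog) are illustrative values, and the correct class is assumed to be cat (index 0). Only the car term is active here: $5.1 - 3.2 + 1 = 2.9$, while the frog term is clipped to 0.

```python
def svm_loss(scores, y):
    """Multiclass SVM (hinge) loss for one example, with margin 1."""
    correct = scores[y]
    loss = 0.0
    for j, s_j in enumerate(scores):
        if j == y:
            continue  # the correct class is excluded from the sum
        loss += max(0.0, s_j - correct + 1.0)
    return loss

scores = [3.2, 5.1, -1.7]  # cat, car, frog
print(svm_loss(scores, 0))  # approximately 2.9 (floating point)
```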
Characteristics: 1. If the scores of an example that already has zero loss change a little, the loss does not change, because after the change $s_{y_i}$ is still more than 1 larger than every wrong label's score.
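This property can be checked numerically. The sketch below assumes hypothetical scores `[1.3, 4.9, 2.0]` for an image whose correct class is index 1. It nudges every score by a small hypothetical amount and confirms that the loss stays at 0, because the margin is still satisfied.

```python
def svm_loss(scores, y):
    """Multiclass SVM (hinge) loss for one example, with margin 1."""
    correct = scores[y]
    return sum(max(0.0, s_j - correct + 1.0)
               for j, s_j in enumerate(scores) if j != y)

car_scores = [1.3, 4.9, 2.0]  # hypothetical scores; correct class is index 1
print(svm_loss(car_scores, 1))  # 0.0

# Small hypothetical perturbations to every score.
jiggled = [s + d for s, d in zip(car_scores, [0.1, -0.1, 0.05])]
print(svm_loss(jiggled, 1))  # still 0.0: the margin of 1 is still met
```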
Min possible loss: 0. Max possible loss: $\infty$.
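When all scores are small random values (so every $s_j \approx 0$), the loss is approximately $C - 1$ ($C$ if the sum also included $j = y_i$), where $C$ stands for the number of categories. Each of the $C - 1$ wrong classes contributes $\max(0, 0 - 0 + 1) = 1$.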
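The $C - 1$ claim makes a handy sanity check at initialization. The sketch below draws small uniform scores in $[-10^{-3}, 10^{-3}]$ with a fixed seed; the scale, the seed, and the helper name `initial_loss` are all assumptions for illustration. Every hinge term is then $1 + s_j - s_{y_i}$, which is positive and within $0.002$ of 1, so the loss stays close to $C - 1$.

```python
import random

def svm_loss(scores, y):
    """Multiclass SVM (hinge) loss for one example, with margin 1."""
    correct = scores[y]
    return sum(max(0.0, s_j - correct + 1.0)
               for j, s_j in enumerate(scores) if j != y)

def initial_loss(C, scale=1e-3, seed=0):
    """Hypothetical helper: SVM loss for C small random scores (correct class 0)."""
    rng = random.Random(seed)
    scores = [rng.uniform(-scale, scale) for _ in range(C)]
    return svm_loss(scores, 0)

for C in (3, 10, 100):
    print(C, initial_loss(C))  # each value is close to C - 1
```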
Regularization
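The full loss adds a regularization term to the data loss: $L(W) = \frac{1}{N}\sum_{i=1}^{N} L_i + \lambda R(W)$, where $\lambda$ is the regularization strength. The most common regularization is the L2 norm: $R(W) = \sum_k \sum_l W_{k,l}^2$.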
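A minimal sketch of the regularized objective follows. It assumes the usual formulation $L(W) = \frac{1}{N}\sum_i L_i + \lambda R(W)$, with the SVM loss as $L_i$ and the squared L2 norm as $R$. The tiny dataset, the labels, and $\lambda = 0.1$ are hypothetical. Each example here has data loss 2, so the mean is 2.0, and $R(W) = 2$, giving a total of $2.0 + 0.1 \times 2 = 2.2$.

```python
def svm_loss(scores, y):
    """Multiclass SVM (hinge) loss for one example, with margin 1."""
    correct = scores[y]
    return sum(max(0.0, s_j - correct + 1.0)
               for j, s_j in enumerate(scores) if j != y)

def l2_penalty(W):
    """Squared L2 norm: sum of squares of every entry of W."""
    return sum(w * w for row in W for w in row)

def full_loss(W, X, Y, lam):
    """Mean SVM data loss over (X, Y) plus lam * R(W)."""
    data_loss = 0.0
    for x, y in zip(X, Y):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]  # W x
        data_loss += svm_loss(scores, y)
    return data_loss / len(X) + lam * l2_penalty(W)

# Hypothetical toy data: 2 classes, 2 features, 2 examples.
W = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
Y = [1, 0]
print(full_loss(W, X, Y, lam=0.1))  # approximately 2.2
```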
Why do we need it?
- Express preferences among models beyond "minimize training error", allowing people to integrate knowledge they have already obtained.
- Avoid overfitting.
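Example: $x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, $w_2 = [0.25, 0.25, 0.25, 0.25]$.
It's obvious that $w_1^T x = w_2^T x = 1$, so both weight vectors produce the same score. However, $R(w_1) = 1$ while $R(w_2) = 4 \times 0.25^2 = 0.25$.
L2 regularization prefers the more balanced weights, which is $w_2$ in this example. Under this preference, the model tends to use as many input features as possible ("spread out the weights").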
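The comparison can be reproduced directly. The sketch assumes the illustrative vectors $x = [1, 1, 1, 1]$, $w_1 = [1, 0, 0, 0]$, and $w_2 = [0.25, 0.25, 0.25, 0.25]$. Both weight vectors give the same dot product with $x$, but the L2 penalty is four times smaller for the spread-out $w_2$.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def l2(w):
    """Squared L2 norm of a weight vector."""
    return sum(v * v for v in w)

x = [1.0, 1.0, 1.0, 1.0]
w1 = [1.0, 0.0, 0.0, 0.0]      # all weight on one feature
w2 = [0.25, 0.25, 0.25, 0.25]  # weight spread over all features

print(dot(w1, x), dot(w2, x))  # 1.0 1.0  (same score)
print(l2(w1), l2(w2))          # 1.0 0.25 (L2 prefers w2)
```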
Prefer simple models: Occam's Razor says that among competing hypotheses, the simplest one should be preferred.
Cross Entropy Loss
SoftMax function: $P(Y = k \mid X = x_i) = \frac{e^{s_k}}{\sum_j e^{s_j}}$, where $s = f(x_i, W)$.
| class | score (logit) | exp(score) | probability |
|---|---|---|---|
| cat | 3.2 | 24.5 | 0.13 |
| car | 5.1 | 164.0 | 0.87 |
| frog | -1.7 | 0.18 | 0.00 |

unnormalized log-probabilities (logits) → exp → unnormalized probabilities → normalize → probabilities
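The two steps in the table (exponentiate, then normalize) can be sketched as follows. The code subtracts the maximum score before exponentiating. This does not change the result, because the factor $e^{-\max}$ cancels between numerator and denominator, but it avoids overflow for large scores. That stabilization trick is a common implementation choice and is not part of the original notes.

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    m = max(scores)                            # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]   # unnormalized probabilities
    total = sum(exps)
    return [e / total for e in exps]           # normalize

probs = softmax([3.2, 5.1, -1.7])  # cat, car, frog
print([round(p, 2) for p in probs])  # [0.13, 0.87, 0.0]
```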
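Maximum Likelihood Estimation: choose the weights so that the probability assigned to the correct class is as high as possible. The loss for example $i$ is the negative log probability of the correct class:
$$L_i = -\log P(Y = y_i \mid X = x_i) = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}$$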
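A minimal sketch of this loss for one example follows. It uses the identity $-\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}} = \log \sum_j e^{s_j} - s_{y_i}$, evaluated with the max-shift (log-sum-exp) trick, which is an implementation choice assumed here. The scores `[3.2, 5.1, -1.7]` are illustrative, and the correct class is assumed to be cat (index 0). The result is $-\log(0.13) \approx 2.04$, using the natural log.

```python
import math

def cross_entropy_loss(scores, y):
    """Softmax cross-entropy loss for one example: log(sum_j e^{s_j}) - s_y."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[y]

loss = cross_entropy_loss([3.2, 5.1, -1.7], 0)  # correct class: cat
print(round(loss, 2))  # 2.04
```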
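Min possible loss: 0 (it can only approach 0, never actually reach it, because every $e^{s_j} > 0$ keeps the correct-class probability strictly below 1). Max possible loss: $\infty$.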
When all scores are small random values, every class gets probability about $1/C$, so the loss is approximately $-\log(1/C) = \log C$, where $C$ stands for the number of categories.
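As with the SVM loss, this gives a quick check at initialization. The sketch uses all-zero scores as the limiting case of "small random values". Here every class has probability exactly $1/C$, so the loss equals $\log C$ (natural log), for example about 2.30 for $C = 10$.

```python
import math

def cross_entropy_loss(scores, y):
    """Softmax cross-entropy loss for one example: log(sum_j e^{s_j}) - s_y."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[y]

for C in (3, 10, 100):
    loss = cross_entropy_loss([0.0] * C, 0)
    print(C, round(loss, 4), round(math.log(C), 4))  # the two values match
```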